Semi-supervised Hierarchical Clustering Analysis for High Dimensional Data

Authors

  • Yuntao Qian
  • Xiaoxu Du
  • Qi Wang
Abstract

In many data mining tasks, unlabeled data are plentiful but labeled data are scarce because labels are expensive to generate. A number of semi-supervised clustering algorithms have therefore been proposed, but few are designed specifically for high-dimensional data. High dimensionality is a difficult challenge for cluster analysis because the data are inherently sparsely distributed, and most popular clustering algorithms, including semi-supervised ones, become ineffective in high-dimensional space. This paper proposes a semi-supervised hierarchical clustering algorithm for high-dimensional data that combines semi-supervised clustering with dimensionality reduction. To keep dimensionality reduction consistent with the detection of the inherent cluster structure, the number of dimensions is reduced sequentially as the clusters gradually form during the hierarchical clustering procedure. The experimental results show the effectiveness of the proposed method.
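The abstract does not give the algorithm's details, so the following is only an illustrative sketch of the general idea (agglomerative clustering that respects a few known labels while dropping dimensions as clusters form), not the authors' method. Everything in it is an assumption made for illustration: the function names, the centroid-linkage merge rule, the cannot-link handling of labeled points, the between/within scatter ratio used to pick the dimension to drop, and the `drop_every` and `min_dims` parameters.

```python
from itertools import combinations

EPS = 1e-12  # avoids division by zero when every cluster is a single point


def centroid(points, members):
    """Mean of the member points, computed over all original dimensions."""
    n = len(members)
    return [sum(points[i][d] for i in members) / n for d in range(len(points[0]))]


def sq_dist(a, b, dims):
    """Squared Euclidean distance restricted to the active dimensions."""
    return sum((a[d] - b[d]) ** 2 for d in dims)


def least_useful_dim(points, clusters, dims):
    """Active dimension with the lowest between/within scatter ratio (assumed criterion)."""
    overall = centroid(points, range(len(points)))
    scores = {}
    for d in dims:
        between = within = 0.0
        for members in clusters:
            c = centroid(points, members)[d]
            between += len(members) * (c - overall[d]) ** 2
            within += sum((points[i][d] - c) ** 2 for i in members)
        scores[d] = between / (within + EPS)
    return min(dims, key=lambda d: scores[d])


def semi_supervised_hac(points, labels, n_clusters, drop_every=2, min_dims=1):
    """Hypothetical sketch. `labels` maps a few point indices to known classes."""
    clusters = [[i] for i in range(len(points))]
    dims = list(range(len(points[0])))
    merges = 0
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            la = {labels[i] for i in clusters[a] if i in labels}
            lb = {labels[i] for i in clusters[b] if i in labels}
            if la and lb and la != lb:
                continue  # cannot-link: the clusters hold different known labels
            d = sq_dist(centroid(points, clusters[a]),
                        centroid(points, clusters[b]), dims)
            if best is None or d < best[0]:
                best = (d, a, b)
        if best is None:
            break  # every remaining merge would violate a label constraint
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
        merges += 1
        # Drop one dimension every `drop_every` merges, down to `min_dims`.
        if merges % drop_every == 0 and len(dims) > min_dims:
            dims.remove(least_useful_dim(points, clusters, dims))
    return clusters, dims
```

Calling `semi_supervised_hac(points, {0: "a", 3: "b"}, n_clusters=2)` returns the final clusters as lists of point indices together with the dimensions still active. The pairwise search is cubic in the number of points, which is acceptable for a toy illustration only.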


Similar resources

High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...


Semi-supervised dimensionality reduction using orthogonal projection divergence-based clustering for hyperspectral imagery

Band clustering and selection are applied to dimensionality reduction of hyperspectral imagery. The proposed method is based on a hierarchical clustering structure, which aims to group bands using an information or similarity measure. Specifically, the distance based on orthogonal projection divergence is used as a criterion for clustering. After clustering, a band selection step is applied to ...


Extracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering

Although many studies have been conducted to improve clustering efficiency, most state-of-the-art schemes suffer from a lack of robustness and stability. This paper aims to propose an efficient approach to elicit prior knowledge, in the form of must-link and cannot-link constraints, from the estimated distribution of raw data in order to convert a blind clustering problem into a semi-supervised o...


Semi-Supervised Clustering for High-Dimensional and Sparse Features

Clustering is one of the most common data mining tasks, used frequently for data organization and analysis in various application domains. Traditional machine learning approaches to clustering are fully automated and unsupervised where class labels are unknown a priori. In real application domains, however, some “weak” form of side information about the domain or data sets can be often availabl...


Semi-supervised learning via penalized mixture model with application to microarray sample classification

MOTIVATION: It is biologically interesting to address whether human blood outgrowth endothelial cells (BOECs) belong to or are closer to large vessel endothelial cells (LVECs) or microvascular endothelial cells (MVECs) based on global expression profiling. An earlier analysis using a hierarchical clustering and a small set of genes suggested that BOECs seemed to be closer to MVECs. By taking adv...




Publication date: 2006